Extraction of Protein Sequence Motifs Information by Bi-Clustering Algorithm

Authors

  • Vincent Yip
  • Bernard Chen
  • Sinan Kockara
Abstract

The activities and functions of proteins can potentially be determined by protein sequence motifs. Obtaining protein sequence motifs that are universally conserved and cross protein family boundaries is therefore crucial. In this study, fuzzy C-means and an improved K-means clustering algorithm are applied to granulate the entire dataset and analyze each granule separately. In addition, a modified bi-clustering algorithm is employed to improve cluster quality. This is the first time a bi-clustering algorithm has been used for cluster extraction purposes. Compared with the traditional shrink method, the modified bi-clustering algorithm generates more clusters with secondary structure similarity greater than 60% at the same data filtering percentage. Moreover, the bi-clustering algorithm is shown to be able to select meaningful amino acids that are of interest to biologists.
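
The granulation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: plain Euclidean distance on toy 2-D points (not real sequence profiles), a fuzzifier m = 2, fixed initial centers, and a hypothetical membership threshold of 0.3 that decides which granules a point joins. The function names are invented for this example and are not the authors' code.

```python
import math


def memberships(points, centers, m=2.0):
    """Standard fuzzy C-means membership of every point in every cluster."""
    exp = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            result.append([h / total for h in hits])
            continue
        inv = [d ** -exp for d in dists]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result


def update_centers(points, u, old_centers, m=2.0):
    """Recompute each center as the membership-weighted mean of all points."""
    dim = len(points[0])
    centers = []
    for j, old in enumerate(old_centers):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        if total == 0.0:
            centers.append(list(old))  # no support for this cluster: keep it
            continue
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, centers, m)
    return centers, memberships(points, centers, m)


def granulate(u, threshold):
    """Assumed rule: a point joins every granule where membership >= threshold."""
    granules = [[] for _ in u[0]]
    for i, row in enumerate(u):
        for j, value in enumerate(row):
            if value >= threshold:
                granules[j].append(i)
    return granules


# Toy data: two tight groups plus one point halfway between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10), (5, 5)]
centers, u = fuzzy_c_means(points, [(0.0, 0.0), (10.0, 10.0)])
granules = granulate(u, threshold=0.3)
print(granules)
```

In this toy run the halfway point is placed in both granules, which is the property that distinguishes fuzzy granulation from a hard partition: each granule can then be clustered on its own (for example with K-means) without losing borderline data.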

Similar resources

Extraction of Motif Patterns from Protein Sequences Using K-Means with segment pruning methods

Bioinformatics is the application of information technology to the management of molecular biological data. Motif finding in protein sequences is one of the most crucial tasks in bioinformatics research. Motifs are identified as overly recurring sub-patterns in segments of protein sequence data. Sequence motifs are verified by their structural similarities or their functional roles i...

Innovative Algorithms and Evaluation Methods for Biological Motif Finding

Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Due to the wide range of applications, many algorithms and computational tools have been developed for efficient search for biological motifs. Therefore, there are more computationally derived motifs than experimentally validated motifs,...

Protein Sequence Motif Information Generated by Fuzzy - Hybrid Hierarchical K-means Clustering Algorithm

Recurring amino acid sequence patterns are referred to as protein sequence motifs. These recurring patterns are important because the conserved regions have the potential to reveal the role of the protein itself. In this paper, we modify the FGK model and apply the Hybrid Hierarchical K-means (HHK) clustering algorithm, which is a hybrid combination of Agglomerative Hierarchical Clustering an...

New Seed Selection Technique for Protein Sequence Motif Identification

Bioinformatics is a field devoted to the interpretation and analysis of biological data using computational techniques. In recent years the study of bioinformatics has grown tremendously due to the huge amount of biological information generated by the scientific community. Protein sequence motifs are short fragments of conserved amino acids often associated with a specific function. Identifying such...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...

Publication date: 2010